is used sticks in your mind.
Who is this course for


University students 


IT Developers from other disciplines 


AWS/ GCP/ On-prem Data Engineers 
Data Architects 


Data Scientists 


Who is this course not for


Your main focus is not learning Azure Data Factory 
You are not interested in hands-on learning approach 
Your only focus is Azure Data Engineering Certification 


Pre-requisites 


No prior knowledge assumed


Cloud fundamentals would be beneficial, not necessary
Basic knowledge of SQL would be beneficial, not necessary
Azure Data Factory Overview


What is Azure Data Factory 


A fully managed, serverless data integration solution for ingesting, 
preparing and transforming all of your data at scale. 


The Data Problem 
What is Azure Data Factory


Data Orchestration Service 


What Azure Data Factory Is Not 


Data Migration Tool 


Data Streaming Service 


Suitable for Complex Data Transformations 
Reporting


Data Lake 


Data Lake to be built with the following data to aid 
Data Scientists to predict the spread of the virus/ 
mortality 


Confirmed cases
Mortality
Hospitalization/ ICU Cases
Testing Numbers
Country's population by age group


Data Warehouse 


Data Warehouse to be built with the following data 
to aid Reporting on Trends 
Storage Solutions


Key Factors to Consider 


Structure of the data


Structured


Semi-Structured 


Unstructured 


Operational need: 


Azure Databases 
Azure Database for PostgreSQL
Azure Database for MariaDB 
Ingest "population by age" for all EU
Countries into the Data Lake to support the 
machine learning models to predict increase 
in Covid-19 mortality rates 
Storage Account: covidreportingdl
Container: raw 
File: population/population_by_age.tsv 
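

The three values above combine into a single file address. A minimal sketch, assuming covidreportingdl is an Azure Data Lake Storage Gen2 account in the public Azure cloud (the slide does not say so); the helper name adls_gen2_locations is hypothetical.

```python
def adls_gen2_locations(account: str, container: str, path: str) -> dict:
    """Build two common address forms for a file in an ADLS Gen2 account."""
    clean_path = path.lstrip("/")
    # Assumption: public Azure cloud endpoint suffix for ADLS Gen2.
    host = f"{account}.dfs.core.windows.net"
    return {
        "abfss": f"abfss://{container}@{host}/{clean_path}",
        "https": f"https://{host}/{container}/{clean_path}",
    }


locations = adls_gen2_locations(
    "covidreportingdl", "raw", "population/population_by_age.tsv"
)
for scheme, uri in locations.items():
    print(scheme, uri)
```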
Scenario 1


Execute Copy Activity when the file becomes available 


Scenario 2 


Execute Copy Activity only if file contents are as expected 


Scenario 3 


Delete the source file on successful copy 
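

The three scenarios chain into one control flow: wait for the file, check its structure, copy, then delete the source. The sketch below mimics that flow in plain Python on the local filesystem. It is an analogy only, not Data Factory code; the function names, the three-column sample file and the timeout values are all hypothetical.

```python
import shutil
import tempfile
import time
from pathlib import Path


def wait_for_file(path: Path, timeout_s: float = 5.0, poll_s: float = 0.1) -> bool:
    """Scenario 1: poll until the file exists or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if path.exists():
            return True
        time.sleep(poll_s)
    return path.exists()


def column_count(path: Path, delimiter: str = "\t") -> int:
    """Count the columns in the header row of a delimited text file."""
    with path.open(encoding="utf-8") as handle:
        header = handle.readline().rstrip("\n")
    return len(header.split(delimiter))


def ingest(source: Path, sink: Path, expected_columns: int) -> str:
    if not wait_for_file(source):
        return "source file not available"
    # Scenario 2: copy only if the file structure is as expected.
    if column_count(source) != expected_columns:
        return "unexpected file structure"
    sink.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, sink)
    # Scenario 3: reached only when the copy raised no error.
    source.unlink()
    return "copied"


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "landing" / "population_by_age.tsv"
    sink = Path(tmp) / "raw" / "population" / "population_by_age.tsv"
    source.parent.mkdir(parents=True)
    # Hypothetical sample content, not the real population file.
    source.write_text("col_a\tcol_b\tcol_c\n1\t2\t3\n", encoding="utf-8")
    result = ingest(source, sink, expected_columns=3)
    outcome = (result, sink.exists(), source.exists())
    print(outcome)
```

Each step returns early on failure, so the source file is deleted only after both checks pass and the copy completes.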
Can only be scheduled for a future time to start
Runs at periodic intervals


Windows are fixed-size and non-overlapping
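

To make "fixed-size, non-overlapping" concrete, the sketch below generates such windows in plain Python. It only illustrates the windowing idea and is not Data Factory's trigger implementation; the function name and the sample dates are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def tumbling_windows(
    start: datetime, end: datetime, size: timedelta
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive [window_start, window_end) pairs of equal size."""
    if size <= timedelta(0):
        raise ValueError("size must be positive")
    window_start = start
    # Emit only complete windows that fit before the end time.
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


windows = list(
    tumbling_windows(
        datetime(2020, 12, 14), datetime(2020, 12, 15), timedelta(hours=6)
    )
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "->", window_end.isoformat())
```

Each window starts exactly where the previous one ends, so every point in time belongs to one window and no interval is processed twice.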
Trigger to Pipeline is one to one


Runs in response to events


Events can be creation or deletion of Blobs/ 
Pipeline Variables


Pipeline Parameters 


Lookup Activity 
Linked Service Parameters


Metadata driven pipeline 


Recent Changes to ECDC Data 


Download COVID-19 datasets 


ECDC switched to a weekly reporting schedule for the COVID-19 situation worldwide and in the EU/EEA
and the UK on 17 December 2020. Hence, all daily updates have been discontinued from 14
December. ECDC will publish updates on the number of cases and deaths reported worldwide and
aggregated by week every Thursday. The weekly data will be available as downloadable files in the
following formats: XLSX, CSV, JSON and XML. As an exception, the weekly updates for the end-of-year
festive season will be published on 23 December and 30 December 2020.


With the switch from daily to weekly reporting, ECDC will shift its Epidemic Intelligence (EI) resources from case
counting to signal/event detection and resume its regular EI activities, which will include COVID-19 signal and
event detection and analysis but also other potential threats.


Granularity of the data changed from daily to weekly 


File structure is also different as a result 


Use Git repo - https://github.com/cloudboxacademy/covid19
Data Ingestion Requirements


Covid-19 new cases and deaths by Country 
Covid-19 Hospital admissions & ICU cases 


Covid-19 Testing Numbers 


Country Response to Covid-19 


URL - https://www.ecdc.europa.eu/en/covid-19/data
Cases & Deaths Data


URL - https://www.ecdc.europa.eu/en/publications-data/data-national-14-day-notification-rate-covid-19 
https://opendata.ecdc.europa.eu/covid19/hospitalicuadmissionrates/csv/data.csv
Container: raw
File: ecdc/hospital_admissions.csv


Parameters & Variables 


Parameters are external values passed into pipelines, datasets or linked 
services. The value cannot be changed inside a pipeline. 


Variables are internal values set inside a pipeline. The value can be changed
inside the pipeline using the Set Variable or Append Variable activity.
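

The difference shows up in a pipeline's JSON definition: parameters are declared and only read, while a variable is overwritten by a Set Variable activity. The sketch below builds such a definition as a Python dictionary and prints it as JSON. The pipeline, parameter and variable names are hypothetical, and the layout is a simplified approximation of the JSON that Data Factory generates.

```python
import json

pipeline = {
    "name": "pl_ingest_ecdc_data",  # hypothetical pipeline name
    "properties": {
        # Parameters: supplied by the caller or trigger, read-only inside the pipeline.
        "parameters": {
            "sourceRelativeUrl": {"type": "string"},
            "sinkFileName": {"type": "string"},
        },
        # Variables: declared here, changed by activities while the pipeline runs.
        "variables": {
            "statusMessage": {"type": "String", "defaultValue": "not started"},
        },
        "activities": [
            {
                "name": "Set status message",
                "type": "SetVariable",
                "typeProperties": {
                    "variableName": "statusMessage",
                    # The expression reads a parameter and writes the result to the variable.
                    "value": {
                        "value": "@concat('Copying to ', pipeline().parameters.sinkFileName)",
                        "type": "Expression",
                    },
                },
            }
        ],
    },
}

print(json.dumps(pipeline, indent=2))
```

Later activities could read the variable with the expression @variables('statusMessage').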
Features


Code-free data transformations


Executed on Data Factory managed 
Databricks Spark clusters 


Benefits from Data Factory scheduling and
monitoring capabilities.
Code-free data transformation at scale
Wrangling Data Flow (Preview)


Code-free data preparation at scale
Limitations


Only available in some regions 


https://docs.microsoft.com/en-us/azure/data-factory/concepts-data-flow-overview#available-regions


Limited set of connectors available 


https://docs.microsoft.com/en-us/azure/data-factory/data-flow-source#supported-sources